ANN with MNIST
Table of Contents
We will use MNIST to build a multinomial classifier that predicts whether a given MNIST image belongs to class 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. Succinctly, we are teaching a computer to recognize handwritten digits.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
Let's download and load the dataset.
mnist = tf.keras.datasets.mnist
(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x/255.0, test_x/255.0
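Dividing by 255.0 rescales the raw 8-bit pixel intensities into the range [0, 1]. A minimal sketch of what this does, using a toy array with made-up pixel values rather than real MNIST data:

```python
import numpy as np

# Hypothetical raw pixel values, stored as 8-bit integers like MNIST
raw = np.array([0, 64, 128, 255], dtype=np.uint8)

scaled = raw / 255.0  # division promotes to float64 and maps 0..255 into 0..1
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Normalizing inputs to a small, consistent range like this tends to make gradient-based training better behaved.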
print("The training data set is:\n")
print(train_x.shape)
print(train_y.shape)
print("The test data set is:")
print(test_x.shape)
print(test_y.shape)
Let's take a look at one of the samples:
# well, that's not a picture (or image), it's an array.
train_x[5].shape
The training set is indeed made up of 28 $\times$ 28 grayscale images of handwritten digits, but each one is stored as a plain NumPy array of pixel intensities rather than as an image file. If you ever work with a flattened version of MNIST (each image as a 784-element vector), you can restore the 2D shape with `reshape`; here it is a no-op, since the Keras loader already returns 28 $\times$ 28 arrays.
img = train_x[5].reshape(28, 28)
# So now we have a 28x28 matrix, where each element is an intensity level from 0 to 1.
img.shape
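As a quick sanity check on how `reshape` lays pixels out, here is a toy flattened vector (not real MNIST data) restored to 2D. NumPy fills rows first (row-major order), so each consecutive run of 28 elements becomes one image row:

```python
import numpy as np

flat = np.arange(784)        # stand-in for a flattened 28x28 image
img = flat.reshape(28, 28)   # row-major: each row of 28 pixels is contiguous

print(img.shape)    # (28, 28)
print(img[1, 0])    # 28 -- row 1 starts at the 28th element of the flat vector
```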
Let's visualize what this image and its corresponding training label look like.
plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
train_y[5]
First, the layer performs a matrix multiplication to produce a set of linear activations.
Second, each linear activation is run through a nonlinear activation function.
Third, the output layer predicts class scores with an affine transformation (followed by a softmax).
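The three steps above can be sketched in plain NumPy (the weights here are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 784))                     # one flattened 28x28 input

W1 = rng.standard_normal((784, 100)) * 0.01  # placeholder hidden-layer weights
b1 = np.zeros(100)
W2 = rng.standard_normal((100, 10)) * 0.01   # placeholder output-layer weights
b2 = np.zeros(10)

h = np.maximum(0.0, x @ W1 + b1)             # affine transform + ReLU
logits = h @ W2 + b2                         # output affine transform
probs = np.exp(logits - logits.max())        # softmax, numerically stabilized
probs = probs / probs.sum()

print(probs.shape)  # (1, 10) -- one probability per digit class
```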
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape = (28, 28)),
tf.keras.layers.Dense(units = 100, activation = 'relu'),
tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
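As a sanity check on the architecture, we can count its trainable parameters by hand (these numbers follow from the layer sizes above, without running Keras): the first Dense layer has 784 × 100 weights plus 100 biases, and the output layer has 100 × 10 weights plus 10 biases.

```python
# Parameters of Dense(100) fed by the flattened 28x28 = 784-element input
dense1 = 28 * 28 * 100 + 100   # weights + biases
# Parameters of the Dense(10) output layer
dense2 = 100 * 10 + 10

print(dense1, dense2, dense1 + dense2)  # 78500 1010 79510
```

Calling `model.summary()` should report the same totals.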
Loss
Optimizer
n_batch: batch size for mini-batch gradient descent
n_iter: the number of iteration steps per epoch
n_epoch: the number of iterations over the entire x and y data provided
Initializer
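The loss we will use, `sparse_categorical_crossentropy`, takes integer class labels directly (no one-hot encoding) and, for each sample, is simply the negative log of the probability the model assigns to the true class. A toy illustration with made-up probabilities:

```python
import numpy as np

probs = np.array([0.05, 0.05, 0.70, 0.20])  # hypothetical softmax output
true_class = 2                               # integer label, as in train_y

loss = -np.log(probs[true_class])            # -log p(true class)
print(round(loss, 4))  # 0.3567
```

The loss shrinks toward 0 as the probability on the true class approaches 1, and blows up as it approaches 0.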
model.compile(optimizer = 'adam',
loss = 'sparse_categorical_crossentropy',
metrics = ['accuracy'])
# Train Model
loss = model.fit(train_x, train_y, epochs = 5)
# Evaluate Test Data
test_loss, test_acc = model.evaluate(test_x, test_y)
test_img = test_x[np.random.choice(test_x.shape[0], 1)]
predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)
plt.figure(figsize = (12,5))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()
print('Prediction : {}'.format(mypred[0]))
You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of overfitting: the model performs worse on new, unseen data than on the data it was trained on.
What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...
$\Rightarrow$ As we saw in lecture, convolutional neural networks (CNNs) are particularly well-suited for a variety of tasks in computer vision, and have achieved near-perfect accuracies on the MNIST dataset. We will build a CNN and ultimately output a probability distribution over the 10 digit classes (0-9) in the next lectures.
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')